Abstract

We present extensions to the Ambisonic Decoder Toolbox to efficiently design periphonic decoders for non-uniform speaker arrays such as hemispherical domes and multilevel rings. These techniques include modified inversion, AllRAD, and spherical Slepian function-based decoders. We also describe a new backend for the toolbox that writes out full-featured decoders in the Faust DSP specification language, which can then be compiled into a variety of plug-in formats. Informal listening tests and performance measurements indicate that these decoders work well for speaker arrays that are difficult to handle with conventional design techniques. The computation is relatively quick and more reliable than the nonlinear optimization techniques used previously.
1 Introduction

This is a paper about extensions to the Ambisonic Decoder Toolbox to efficiently design decoders for loudspeaker arrays with partial coverage of the sphere, such as domes and multilevel rings. The criteria for Ambisonic reproduction are:

• Constant amplitude and energy gain for all 
source directions 

• At low frequencies, reproduced wavefront 
direction and velocity are correct 

• At high frequencies, maximum concentration of energy in the source direction

• Matching high- and low-frequency perceived 
directions 

In the case of decoders for partial-coverage 
arrays, we relax these to apply only to source 
directions that are within the covered part of the 
sphere, but still require that the decoder be “well 
behaved” for sources from other directions. 

Conventional techniques for periphonic decoder design work well when the speakers are distributed uniformly around the listening position.


First-order Ambisonics can be accommodated in 
many listening rooms; however, when moving 
to higher-order reproduction the need arises to 
place more loudspeakers below the listener. This 
requires placing the listening position high in 
the room or on an acoustically transparent floor 
with a space below to install speakers. Neither of these is practical for most installations, so hemispherical dome configurations are a popular
alternative. In addition, it may be impractical 
to install speakers directly overhead, resulting 
in a configuration of horizontal rings of speakers 
at multiple heights. These configurations leave 
gaps in coverage below, and possibly above, the 
listening position. 

In a previous paper, we described a Matlab/GNU Octave¹ toolbox for generating Ambisonic decoders that uses inversion or projection to generate an initial estimate and then nonlinear optimization to simultaneously maximize $r_E$ and minimize directional and loudness errors [Heller, Benjamin, and Lee 2012]. While this works well for small arrays, we found that increasing the Ambisonic order and number of loudspeakers causes the optimizer to converge slowly and get stuck in local minima unless the starting solution is close to optimal.²

¹ In this paper, we use “Matlab” to refer to both Matlab and GNU Octave. Care has been taken to make sure the code runs in both; however, not all of the graphics work well in Octave. Matlab is a registered trademark of The MathWorks, Inc.

² A recent paper by Arteaga [2013] takes advantage of symmetries in the loudspeaker array and a reformulation of the objective function to improve the convergence behavior of the optimization process.

In the case of hemispherical domes and multilevel rings, neither inversion nor projection provides a close starting point. Once the speaker array deviates from uniform geometry, an inversion decoder trades uniform loudness for directional accuracy by putting more energy in directions where the gaps between loudspeakers are larger. A projection decoder does just the opposite, putting equal energy into all the speakers regardless of spacing; hence, it is louder in directions where there are more speakers. In practice, neither provides an adequate starting point for the optimization process.

The general problem is that it is difficult to pull the sound image beyond the region where there is dense coverage. For hemispheres, this means not only that performance will suffer below the horizon, but also that it will be poor at the horizon. Because horizontal performance is uniquely important, it is necessary to make the decoder perform well there, despite the difficulties.

New design techniques have been proposed over the last few years to handle these sorts of arrays. We have implemented these in the toolbox to make them available to a wider user group. The toolbox has also been extended beyond third-order decoding, and to support component-order and normalization conventions other than Furse-Malham. We also wanted to support a variety of plug-in architectures, so a new decoder engine was written in the Faust (Functional Audio Stream) DSP specification language [Orlarey, Fober, and Letz 2009; Smith 2013a], which includes facilities for dual-band decoding, and near-field, distance, and loudness compensation.

1.1 Auditory Localization 

In this paper we use Gerzon’s two main localization models to predict decoder performance: the velocity localization vector, $\mathbf{r}_V$, and the energy localization vector, $\mathbf{r}_E$. These are defined and discussed in our previous paper on the toolbox [Heller, Benjamin, and Lee 2012] (and in many other places). Briefly, these models encapsulate the primary interaural time difference (ITD) and interaural level difference (ILD) theories of auditory localization. The direction of each vector indicates the direction of the localization perception, and its magnitude indicates the quality of the localization. In natural hearing from a single source, the magnitude of each is exactly 1 and the direction is the direction to the source.
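As a concrete illustration, the sketch below computes both vectors from Gerzon's usual definitions, $\mathbf{r}_V = \sum_i g_i \hat{\mathbf{u}}_i / \sum_i g_i$ and $\mathbf{r}_E = \sum_i g_i^2 \hat{\mathbf{u}}_i / \sum_i g_i^2$, for real loudspeaker gains $g_i$ and unit direction vectors $\hat{\mathbf{u}}_i$. It is a minimal Python sketch, not toolbox code; the function name `gerzon_vectors` is hypothetical, and it assumes the gains do not sum to zero.

```python
import math

def gerzon_vectors(gains, directions):
    """Return (r_V, r_E) for real speaker gains and unit direction vectors.

    r_V = sum(g_i * u_i) / sum(g_i)
    r_E = sum(g_i**2 * u_i) / sum(g_i**2)
    """
    p = sum(gains)                      # pressure (amplitude) sum
    e = sum(g * g for g in gains)       # energy sum
    r_v = [sum(g * u[k] for g, u in zip(gains, directions)) / p for k in range(3)]
    r_e = [sum(g * g * u[k] for g, u in zip(gains, directions)) / e for k in range(3)]
    return r_v, r_e

def norm(v):
    return math.sqrt(sum(c * c for c in v))

# Two speakers at +/-45 degrees azimuth with equal gains: a phantom source in front
s = math.sqrt(0.5)
r_v, r_e = gerzon_vectors([1.0, 1.0], [(s, s, 0.0), (s, -s, 0.0)])
print(norm(r_v), norm(r_e))  # both about 0.707, pointing straight ahead
```

A loudspeaker playing alone yields magnitude 1, as in natural hearing; spreading the signal over two speakers 90° apart drops both magnitudes to about 0.707, with the vectors pointing at the phantom source.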

1.2 Math Notation 

We use lowercase bold roman type to denote vectors ($\mathbf{v}$), uppercase bold roman type to denote matrices ($\mathbf{M}$), italic type to denote scalars ($s$), and sans serif type to denote signals (W). A scalar with the same name as a vector denotes the magnitude of the vector. A vector with a circumflex (“hat”) is a unit vector, so, for example, $\hat{\mathbf{r}}_E = \mathbf{r}_E / r_E$. $\mathbf{A}^{+}$ is the Moore-Penrose pseudoinverse of $\mathbf{A}$ (pinv(A) in Matlab) and $\mathbf{A}^{T}$ is the transpose of $\mathbf{A}$ (A.' in Matlab).

2 Decoder Design Techniques for Domes and Multilevel Rings

In Ambisonics, the standard technique for deriving the basic decoder matrix, $\mathbf{M}$, is to invert the matrix, $\mathbf{K}$, whose columns are composed of the spherical harmonics sampled at the speaker positions, such that $\mathbf{K}\mathbf{M} = \mathbf{I}$, where $\mathbf{I}$ is the identity matrix [Gerzon 1980; Heller, Lee, and Benjamin 2008].³

Because $\mathbf{K}$ is “encoding” the speaker positions, some authors call it the reencoding matrix and refer to the inversion as mode matching. In general, $\mathbf{K}$ is not square and may be rank deficient, so the inversion must be done by least squares or by using the singular-value decomposition (SVD) and the Moore-Penrose pseudoinverse.

Problems arise when a given loudspeaker array does a poor job of sampling some of the spherical harmonics, such as sampling at or near zero crossings or having more than one zero crossing between samples. In these cases, $\mathbf{K}$ will be ill-conditioned (difficult to invert without loss of precision) and the resulting decoder will have greater energy gain in certain directions, resulting in reduced $r_E$ and greater loudness in those directions.
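The following minimal sketch (Python with NumPy, first order only for brevity) illustrates both points: the pseudoinverse decoder satisfies the re-encoding condition $\mathbf{K}\mathbf{M} = \mathbf{I}$, but its energy gain is constant over the sphere only for a uniform array. The 13-speaker dome (8 speakers at 0°, 4 at 45°, 1 at the zenith) is a hypothetical layout, and `sh1` uses ACN channel order with N3D normalization; neither is taken from the toolbox.

```python
import numpy as np

def sh1(az_deg, el_deg):
    """First-order real spherical harmonics, ACN order (W, Y, Z, X), N3D.

    Takes arrays of azimuth/elevation in degrees; returns a 4 x n matrix
    whose columns are the harmonics sampled at each direction.
    """
    az, el = np.radians(az_deg), np.radians(el_deg)
    s3 = np.sqrt(3.0)
    return np.vstack([np.ones_like(az),
                      s3 * np.sin(az) * np.cos(el),
                      s3 * np.sin(el),
                      s3 * np.cos(az) * np.cos(el)])

def mode_matching(spk_az, spk_el):
    """Return K, M = pinv(K), and the max/min energy gain over the sphere in dB."""
    K = sh1(np.asarray(spk_az, float), np.asarray(spk_el, float))  # channels x speakers
    M = np.linalg.pinv(K)                                           # speakers x channels
    az, el = np.meshgrid(np.arange(0.0, 360.0, 5.0), np.arange(-90.0, 91.0, 5.0))
    Y = sh1(az.ravel(), el.ravel())          # test source directions
    E = np.sum((M @ Y) ** 2, axis=0)         # energy gain per source direction
    return K, M, 10 * np.log10(E.max() / E.min())

# Hypothetical layouts: a uniform octahedron and a 13-speaker dome
octa_az, octa_el = [0, 90, 180, 270, 0, 0], [0, 0, 0, 0, 90, -90]
dome_az = [45 * k for k in range(8)] + [90 * k for k in range(4)] + [0]
dome_el = [0] * 8 + [45] * 4 + [90]

K_oct, M_oct, spread_oct = mode_matching(octa_az, octa_el)
K_dome, M_dome, spread_dome = mode_matching(dome_az, dome_el)
print(f"energy spread: octahedron {spread_oct:.2f} dB, dome {spread_dome:.2f} dB")
```

For the octahedron the spread is essentially 0 dB; for the dome, directions with no nearby loudspeakers (below the horizon) receive noticeably more energy.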

In the following subsections, we discuss three 
strategies implemented in the toolbox: 

• Use an inversion technique suited to ill-conditioned problems

• Invert a well-behaved full-sphere coverage 
array, map to the real array 

• Derive a new set of basis functions for which 
the inversion is well behaved 

2.1 Modified Inversion 

One proposed solution is to set all of the singular values to 1 when computing the pseudoinverse [Pomberger and Zotter 2012]. This has the effect of diminishing the use of the poorly sampled spherical harmonics. The resulting decoder has constant energy (hence, loudness) in all directions, at the expense of increased directional errors.

Another solution is to use a truncated SVD when computing the pseudoinverse. This simply discards the poorly sampled spherical harmonics. In the conventional pseudoinverse (e.g., as implemented in Matlab), normalized singular values⁴ less than $10^{-15}$ are not inverted. In a truncated SVD, a much larger threshold is used. For example, setting the threshold to $1/\sqrt{2}$ puts an upper limit of 3 dB on the loudness variations, because the energy gain of each retained component is proportional to $1/s^2$; again, this comes at the expense of increased directional errors.

³ The term sampling is used here to mean evaluating the given spherical harmonic function at a particular azimuth and elevation.

The toolbox can also produce decoders that are linear combinations of the conventional pseudoinverse and these alternatives, providing a single parameter to trade off uniform loudness against directional accuracy. Other approaches to inverting ill-conditioned matrices have been applied to this problem, such as Tikhonov regularization [Poletti 2005] and LASSO (least absolute shrinkage and selection operator) [Chen and Huang 2013]. We have not yet implemented these, although the linear-combination approach described above provides a result similar to Tikhonov regularization.
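A sketch of the three inversions discussed above, using the SVD $\mathbf{K} = \mathbf{U}\mathbf{S}\mathbf{V}^T$ on the same hypothetical first-order dome: the conventional pseudoinverse $\mathbf{V}\mathbf{S}^{-1}\mathbf{U}^T$, the variant with all singular values set to 1 ($\mathbf{V}\mathbf{U}^T$), and a truncated SVD that keeps only normalized singular values of at least $1/\sqrt{2}$. Because the energy gain of each retained component scales as $1/s^2$, that threshold limits the spread contributed by the retained singular values to a factor of 2 (about 3 dB). The helper names are hypothetical, and the blending parameter the toolbox offers is not reproduced here.

```python
import numpy as np

def sh1(az_deg, el_deg):
    """First-order real spherical harmonics, ACN order (W, Y, Z, X), N3D."""
    az, el = np.radians(az_deg), np.radians(el_deg)
    s3 = np.sqrt(3.0)
    return np.vstack([np.ones_like(az),
                      s3 * np.sin(az) * np.cos(el),
                      s3 * np.sin(el),
                      s3 * np.cos(az) * np.cos(el)])

def svd_decoders(K, threshold=1 / np.sqrt(2)):
    """Conventional, unity-singular-value, and truncated-SVD decoders for K.

    Assumes K has full row rank (no zero singular values).
    """
    U, s, Vt = np.linalg.svd(K, full_matrices=False)  # K = U @ diag(s) @ Vt
    m_pinv = Vt.T @ np.diag(1.0 / s) @ U.T             # Moore-Penrose pseudoinverse
    m_unity = Vt.T @ U.T                               # every singular value set to 1
    keep = s / s[0] >= threshold                       # test normalized singular values
    m_trunc = Vt[keep].T @ np.diag(1.0 / s[keep]) @ U[:, keep].T
    return m_pinv, m_unity, m_trunc

# Same hypothetical 13-speaker dome: 8 at 0 deg, 4 at 45 deg, 1 at the zenith
dome_az = np.array([45.0 * k for k in range(8)] + [90.0 * k for k in range(4)] + [0.0])
dome_el = np.array([0.0] * 8 + [45.0] * 4 + [90.0])
K = sh1(dome_az, dome_el)
m_pinv, m_unity, m_trunc = svd_decoders(K)

# Relative energy gain (dB) for sources at the zenith, front horizon, and nadir
Y = sh1(np.array([0.0, 0.0, 0.0]), np.array([90.0, 0.0, -90.0]))
for name, m in [("pinv", m_pinv), ("unity", m_unity), ("truncated", m_trunc)]:
    e = np.sum((m @ Y) ** 2, axis=0)
    print(name, np.round(10 * np.log10(e / e.max()), 2))
```

The unity-singular-value decoder gives the same energy for every test direction, because $\mathbf{M}^T\mathbf{M} = \mathbf{I}$ and the N3D harmonics have a direction-independent norm; the truncated decoder drops the weakest component (here a mixture of W and Z).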

2.2 Hybrid Ambisonic-VBAP Decoding 

The hybrid Ambisonic-VBAP approach is called “All-Round Ambisonic Decoding” (AllRAD) by Zotter and Frank [2012]. Briefly, one computes a decoder for a uniform array of virtual speakers and then maps the signals for the virtual array to the real loudspeaker array using Vector Base Amplitude Panning (VBAP) [Pulkki 1997].

VBAP always produces the smallest possible angular spread of energy for a given panning direction and speaker array; hence, the perceived size of a virtual source changes depending on direction. This is directly at odds with the Ambisonic approach, which tries to keep the perceived size of a virtual source constant regardless of source direction. AllRAD uses two strategies to mitigate this:

1. The number of virtual speakers is made much larger than the number of real speakers.

2. Imaginary speakers are inserted to fill in large gaps in the real loudspeaker array in order to keep the triangular faces of the tessellation as regular as possible.

⁴ The set of singular values divided by the largest one.

[Figure 1 plot: Speaker Locations for CCRMA Listening Room Dome]

Figure 1: Plot of real speaker locations for the upper hemisphere in CCRMA’s Listening Room (black hexagrams), unit-sphere tessellation, and intersection points of 240 virtual speaker directions (green plus signs). The speaker at the bottom is an imaginary speaker added to keep the facets of the tessellation as regular as possible. The locations of the intersection points are used to calculate the VBAP gains to the real speakers.

AllRAD places the virtual speakers according to a spherical t-design [Hardin and Sloane 2002]. A spherical t-design of degree t is a finite set of points on a sphere such that the average value of any polynomial of degree t or less over the sphere is equal to the average of its values sampled at the points in the set. The present implementation uses the 240-point spherical t-design (t = 21) for the virtual array, which is the largest currently known t-design.
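The t-design definition can be checked numerically. This sketch tests a point set against every monomial $x^a y^b z^c$ of total degree at most $t$ (these span all polynomials of that degree), comparing the average over the points with the exact average over the sphere. The octahedron is used only because it is small; it is not the 240-point design used by the toolbox, and the function names are hypothetical.

```python
import math
from itertools import product

def sphere_mean(a, b, c):
    """Exact average of x**a * y**b * z**c over the unit sphere."""
    if a % 2 or b % 2 or c % 2:
        return 0.0                      # odd powers average to zero by symmetry
    g = math.gamma
    integral = 2 * g((a + 1) / 2) * g((b + 1) / 2) * g((c + 1) / 2) / g((a + b + c + 3) / 2)
    return integral / (4 * math.pi)     # divide by the sphere's area

def point_mean(points, a, b, c):
    return sum(x**a * y**b * z**c for x, y, z in points) / len(points)

def is_t_design(points, t, tol=1e-12):
    """True if the point average matches the sphere average for all degrees <= t."""
    for a, b, c in product(range(t + 1), repeat=3):
        if a + b + c <= t and abs(point_mean(points, a, b, c) - sphere_mean(a, b, c)) > tol:
            return False
    return True

octahedron = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
print(is_t_design(octahedron, 3), is_t_design(octahedron, 4))  # True False
```

The octahedron passes for $t = 3$ but fails for $t = 4$: for example, the mean of $x^4$ over its vertices is 1/3, versus 1/5 over the sphere.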

There are three steps to the design of an AllRAD decoder:

1. Select a spherical t-design for the array of virtual speakers and compute a decoder for it. Because the virtual speakers are distributed uniformly on the sphere, the inversion is well behaved.

(a) Compose the matrix $\mathbf{K}_V$ whose columns are the spherical harmonics sampled at the directions of the virtual speakers.

(b) Compute the decoder matrix for the virtual array, $\mathbf{M}_V = \mathbf{K}_V^{+}$.

2. Compute the matrix of VBAP gains for each virtual speaker.

(a) Project the positions of the real speakers onto the unit sphere.

(b) Add imaginary speakers to the array to fill in any gaps larger than 90°. For a dome, this will be one at the bottom; for a multilevel ring, one at the top and one at the bottom. The distance from the center determines how quickly sources fade as they move outside the region of the sphere covered by the real speaker array.

(c) Compute the triangular tessellation of the convex hull of the projected speaker positions.

(d) Determine the intersection point of the vector to each virtual speaker with the faces of the convex hull.

(e) Calculate the barycentric coordinates of each intersection point. These are the VBAP gains from that virtual speaker to the three real speakers at the vertices of the face.

(f) Assemble the matrix of the VBAP gains, $\mathbf{G}_{V \to R}$. This matrix has one column for each virtual speaker and one row for each real speaker. Each column will have up to three gains for that virtual speaker from the previous step. Gains to imaginary speakers are omitted.

3. The basic decoder matrix is

$$\mathbf{M} = \mathbf{G}_{V \to R}\,\mathbf{M}_V.$$

Figure 2: The AllRAD decoder’s performance for the upper hemisphere of CCRMA’s Listening Room. These show the (a) energy concentration, (b) directional accuracy, and (c) loudness of sources from various directions. Directional errors are clipped at 10° so that smaller errors can be seen. The plots have been quantized to make the structure clearer. Note that the Mercator projection used overemphasizes the poles.

Figure 1 shows the real and imaginary speaker positions, the tessellation of the speaker directions, and the intersection points of the vectors to each virtual speaker with the faces of the tessellation. The example shown is for the upper hemisphere of loudspeakers in CCRMA’s Listening Room. Figure 2 shows the performance of the AllRAD decoder used in the listening tests.
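The three AllRAD steps can be traced on a toy example. In this sketch (Python with NumPy, first order), the real array is a hypothetical square of horizontal speakers plus one at the zenith, an imaginary speaker fills the gap at the nadir, and the tessellation is written out by hand because the layout is an octahedron; a general implementation would compute the convex hull instead. The eight vertices of a cube (a spherical 3-design, sufficient for first order) stand in for the 240-point virtual array, and no max-$r_E$ weighting is applied.

```python
import numpy as np
from itertools import product

def sh1_xyz(u):
    """First-order real SH (ACN order W, Y, Z, X; N3D) for unit vectors, one per row."""
    u = np.atleast_2d(u)
    s3 = np.sqrt(3.0)
    return np.vstack([np.ones(len(u)), s3 * u[:, 1], s3 * u[:, 2], s3 * u[:, 0]])

def face_gains(p, tri):
    """Barycentric gains of direction p on triangle tri (rows = vertices), or None."""
    g = np.linalg.solve(tri.T, p)       # VBAP: g0*v0 + g1*v1 + g2*v2 = p
    if np.any(g < -1e-9):
        return None                     # the ray along p does not cross this face
    return g / g.sum()                  # coordinates of the intersection point

# Hypothetical layout: rows 0-4 are real speakers, row 5 is the imaginary nadir speaker
verts = np.array([[1, 0, 0], [0, 1, 0], [-1, 0, 0], [0, -1, 0], [0, 0, 1], [0, 0, -1]], float)
n_real = 5
faces = [(a, b, c) for a in (0, 2) for b in (1, 3) for c in (4, 5)]  # octahedron facets

# Virtual array: cube vertices, a spherical 3-design
virt = np.array(list(product([1, -1], repeat=3)), float) / np.sqrt(3.0)

G = np.zeros((len(verts), len(virt)))
for j, p in enumerate(virt):
    for f in faces:
        g = face_gains(p, verts[list(f)])
        if g is not None:
            G[list(f), j] = g
            break

G_vr = G[:n_real]                        # omit gains to the imaginary speaker
M_v = np.linalg.pinv(sh1_xyz(virt))      # decoder for the virtual array
M = G_vr @ M_v                           # basic AllRAD decoder (5 speakers x 4 channels)
print(np.round(G_vr, 3))
```

Each column of `G` sums to 1 before the imaginary row is dropped; virtual speakers below the horizon lose the share that went to the nadir, which is how the decoder fades sources outside the covered region.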


2.3 Spherical Slepian Function Decoding

Spherical Slepian functions (SSFs) are linear combinations of spherical harmonics that produce new basis functions that are approximately zero outside a chosen region of the sphere, but remain orthogonal within the region of interest. This makes them suitable for decomposing spherical-harmonic models into portions that have significant energy only in selected areas [Beggan et al. 2013; Simons, Dahlen, and Wieczorek 2006]. They have been used in satellite geodesy to model the magnetic and gravitational fields of the earth from satellite data that do not cover the whole earth. In designing Ambisonic decoders, they allow us to specify a region of interest on the sphere and derive a new set of basis functions that is well conditioned within that region. Zotter et al. call this “Energy-Preserving Ambisonic Decoding” (EPAD) [2012]. The procedure implemented in the toolbox is described here.

1. Define the subset of the surface of the sphere for the decoder, $\mathcal{R} \subset S^2$, where $S^2$ denotes the surface of the unit sphere in $\mathbb{R}^3$. To assure good performance at the boundary, make it a bit larger than the area covered by the loudspeakers; for the decoder tested, we used −30° to 90° elevation.

2. Compose the Gramian matrix, $\mathbf{G}$, of the inner products of the real spherical harmonics over the region $\mathcal{R}$. Each element, $g_{lm,l'm'}$, of $\mathbf{G}$ is given by

$$g_{lm,l'm'} = \langle Y_{lm}, Y_{l'm'} \rangle_{\mathcal{R}} = \int_{\mathcal{R}} Y_{lm}(\hat{\mathbf{u}})\, Y_{l'm'}(\hat{\mathbf{u}})\, d\Omega,$$

where $lm$ is a single-index designator for the real spherical harmonic of degree $l$ and order $m$, $\hat{\mathbf{u}} = [\cos\theta\cos\phi,\ \sin\theta\cos\phi,\ \sin\phi]^T$, and $\theta$ and $\phi$ are azimuth and elevation.

3. Compute the eigendecomposition of $\mathbf{G}$, $\mathbf{G} = \mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^{-1}$. $\mathbf{U}$ is a unitary matrix whose columns are the eigenvectors of $\mathbf{G}$. The diagonal elements of $\boldsymbol{\Lambda}$ are the corresponding eigenvalues.

4. Compose a new matrix, $\mathbf{U}_{\mathrm{SSF}}$, by selecting the columns of $\mathbf{U}$ with eigenvalues above some threshold, $\alpha$. $\alpha$ should be approximately the fraction of the sphere covered by the region of interest. For a hemispherical dome, we use $\alpha = 1/2$. Its transpose, $\mathbf{U}_{\mathrm{SSF}}^T$, transforms points in the spherical harmonic basis to points in the new SSF basis.

5. Compose the speaker reencoding matrix, $\mathbf{K}$, whose columns are the spherical harmonics sampled at each speaker direction. Transform it to the new basis, $\mathbf{K}_{\mathrm{SSF}} = \mathbf{U}_{\mathrm{SSF}}^T \mathbf{K}$.

6. Compute the basic decoder matrix, $\mathbf{M} = \mathbf{K}_{\mathrm{SSF}}^{+}\,\mathbf{U}_{\mathrm{SSF}}^T$.
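Steps 1 to 6 can be sketched for first order as follows. Assumptions for this illustration: orthonormal real spherical harmonics (so the eigenvalues of $\mathbf{G}$ lie between 0 and 1 and sum to the number of harmonics times the covered fraction of the sphere), a midpoint-rule quadrature on a 1° grid in place of whatever integration the toolbox uses, $\alpha = 0.5$, and the same hypothetical 13-speaker dome used in the earlier sketches.

```python
import numpy as np

def sh1_orthonormal(az, el):
    """First-order real SH (ACN order W, Y, Z, X), orthonormal on the sphere (radians)."""
    c = 1.0 / np.sqrt(4 * np.pi)
    s3 = np.sqrt(3.0)
    return c * np.vstack([np.ones_like(az),
                          s3 * np.sin(az) * np.cos(el),
                          s3 * np.sin(el),
                          s3 * np.cos(az) * np.cos(el)])

# Steps 1-2: Gramian over -30 to 90 degrees elevation, midpoint rule on a 1-degree grid
n_az, n_el = 360, 180
d_az, d_el = 2 * np.pi / n_az, np.pi / n_el
az = (np.arange(n_az) + 0.5) * d_az
el = -np.pi / 2 + (np.arange(n_el) + 0.5) * d_el
AZ, EL = np.meshgrid(az, el)
in_region = EL >= np.radians(-30.0)
w = (np.cos(EL) * d_az * d_el)[in_region]        # area element: cos(el) daz del
Y = sh1_orthonormal(AZ[in_region], EL[in_region])
G = (Y * w) @ Y.T

# Step 3: eigendecomposition (G is symmetric, so eigh returns orthonormal eigenvectors)
lam, U = np.linalg.eigh(G)
order = np.argsort(lam)[::-1]                    # sort by decreasing eigenvalue
lam, U = lam[order], U[:, order]

# Step 4: keep basis functions whose eigenvalue exceeds alpha
alpha = 0.5
U_ssf = U[:, lam > alpha]

# Steps 5-6: transform the reencoding matrix and invert in the new basis
spk_az = np.radians([45.0 * k for k in range(8)] + [90.0 * k for k in range(4)] + [0.0])
spk_el = np.radians([0.0] * 8 + [45.0] * 4 + [90.0])
K = sh1_orthonormal(spk_az, spk_el)
K_ssf = U_ssf.T @ K
M = np.linalg.pinv(K_ssf) @ U_ssf.T              # speakers x channels
print(np.round(lam, 3))
```

For the −30° to 90° region, which covers 75% of the sphere, the four eigenvalues sum to about 3; three exceed 0.5 and are kept, so the decoder inverts a 3 × 13 matrix instead of the full 4 × 13 one.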

Figure 3 shows balloon plots of all 16 spherical Slepian basis functions for the region −30° to 90° elevation on the sphere. Note that the first eight are concentrated in the upper hemisphere, the next two in the middle, and the last six in the lower hemisphere. The first 13 (those with $\lambda > 1/2$) were used for the third-order decoder we tested. One observation is that this method creates basis functions that have a clearer relationship with source directions than the spherical harmonics above first order do. Figure 4 shows the performance of the SSF decoder used in the listening tests.

2.4 Max-$r_E$ Decoders

The basic decoder matrices, $\mathbf{M}$, calculated in the preceding sections, are transformed into max-$r_E$ decoders by multiplying by a matrix, $\mathbf{T}$, whose diagonal entries are the per-order gains that maximize $r_E$ over the sphere: $\mathbf{M}_{\text{max-}r_E} = \mathbf{M}\mathbf{T}$. The calculation of these gains is discussed in the appendix of [Heller, Benjamin, and Lee 2012].
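One widely used result for the 3D max-$r_E$ gains (due to Daniel) is $g_l = P_l(r_E)$, where $P_l$ is the Legendre polynomial of degree $l$ and $r_E$ is the largest root of $P_{N+1}$ for order $N$. The sketch below computes these gains in plain Python and expands them into the diagonal of $\mathbf{T}$ in ACN channel order; it may differ in detail from the derivation in the appendix cited above, and the function names are hypothetical.

```python
def legendre(n, x):
    """Legendre polynomial P_n(x) via the three-term recurrence."""
    if n == 0:
        return 1.0
    p_prev, p = 1.0, x
    for k in range(1, n):
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
    return p

def largest_root(n, step=1e-3, iters=100):
    """Largest root of P_n on (0, 1]: scan down from 1 for a sign change, then bisect."""
    hi, lo = 1.0, 1.0 - step            # P_n(1) = 1 > 0
    while legendre(n, lo) > 0:
        hi, lo = lo, lo - step
    for _ in range(iters):              # invariant: P_n(lo) <= 0 < P_n(hi)
        mid = 0.5 * (lo + hi)
        if legendre(n, mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def max_re_gains(order):
    """Per-degree 3D max-r_E gains g_l = P_l(r_E), r_E = largest root of P_{N+1}."""
    r_e = largest_root(order + 1)
    return [legendre(l, r_e) for l in range(order + 1)]

def per_channel_gains(order):
    """Diagonal of T: the degree-l gain repeated for its 2l + 1 channels (ACN order)."""
    return [g for l, g in enumerate(max_re_gains(order)) for _ in range(2 * l + 1)]

print([round(g, 4) for g in max_re_gains(3)])  # [1.0, 0.8611, 0.6123, 0.3047]
```

Applying $\mathbf{T}$ then amounts to scaling column $k$ of $\mathbf{M}$ by `per_channel_gains(N)[k]`.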

3 In-situ Performance Measurements

The Ambisonic decoder design philosophies discussed above are generally intended to optimize the psychoacoustically based parameters of Gerzon’s energy vector theory. Those parameters are expected to predict the subjective performance of the system in general, but they are not the same as the parameters that directly predict what is heard by the listeners. We use measurements of the ITD and ILD to gauge the localization performance of actual systems. ITDs are known to predict the localization of low-frequency sounds, and ILDs are known to predict the localization of high-frequency sounds.

A group of measurements was performed in CCRMA’s Listening Room at Stanford University.⁵ That room is equipped with 22 loudspeakers arranged as a horizontal ring of eight loudspeakers, rings of six loudspeakers at +40° and −50° elevation, and one loudspeaker each at the zenith and nadir. This allowed us to use either the full spherical array or decoders designed specifically to drive the upper 15 loudspeakers as a hemisphere. One hemispherical decoder was derived by using the AllRAD method and the other by using an SSF basis set.

The ITDs and ILDs created by real systems were measured by using a dummy head to record test signals reproduced from a variety of directions. The test signals are Ambisonically panned exponential sine sweeps, from which an impulse response is computed for each direction. Those impulse responses are binaural impulse responses, from which the ITDs and ILDs can be derived.

The ITDs were calculated by band-pass filtering the impulse responses to the bandwidth of interest and comparing the time of arrival at the two ears of the dummy head. Performing the calculation at a 192 kHz sample rate gives a time resolution of about 5 µs. The measurement was repeated in each of 37 directions at 10° intervals around the horizon, for each of the three decoders being evaluated. The result is shown in Figure 5a. All three decoders provide a plausible ITD result; the significant differences occur at the sides.
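A minimal sketch of one way to compute an ITD, assuming NumPy: both ear signals are band-limited with a crude zero-phase FFT filter (the 125 to 500 Hz band around 250 Hz is an assumption), and the ITD is taken from the peak of their circular cross-correlation. Cross-correlation is substituted here for the time-of-arrival comparison described above, and a synthetic 50-sample delay stands in for measured impulse responses.

```python
import numpy as np

FS = 192_000  # sample rate used for the ITD analysis (Hz)

def bandpass_fft(x, fs, lo, hi):
    """Zero-phase brick-wall band-pass via the FFT (crude, for illustration only)."""
    X = np.fft.rfft(x)
    f = np.fft.rfftfreq(len(x), 1.0 / fs)
    X[(f < lo) | (f > hi)] = 0.0
    return np.fft.irfft(X, len(x))

def itd_seconds(h_left, h_right, fs=FS, lo=125.0, hi=500.0):
    """Delay of the right ear relative to the left (positive: left ear leads)."""
    l = bandpass_fft(h_left, fs, lo, hi)
    r = bandpass_fft(h_right, fs, lo, hi)
    # Circular cross-correlation via the FFT; the peak index is the delay of r vs. l
    c = np.fft.irfft(np.fft.rfft(r) * np.conj(np.fft.rfft(l)), len(l))
    k = int(np.argmax(c))
    if k > len(c) // 2:
        k -= len(c)                      # wrap large indices to negative lags
    return k / fs

# Synthetic check: right-ear impulse arrives 50 samples after the left-ear impulse
h_l = np.zeros(2**15)
h_l[16000] = 1.0
h_r = np.zeros(2**15)
h_r[16050] = 1.0
print(itd_seconds(h_l, h_r) * 1e6, "us")  # about 260.4 us
print(1e6 / FS, "us per sample")          # about 5.2 us
```

At 192 kHz one sample is about 5.2 µs, which sets the resolution of this estimate unless the correlation peak is interpolated.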

ILDs are considerably more complex than ITDs, with the major differences between the two ears occurring at frequencies above 1 kHz. As a simplification to make comparison easier, the ILD was calculated as an average level between 1 and 4 kHz. As for the ITDs, the ILD was calculated at 10° intervals around the horizon. The results are shown in Figure 5b.
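The band-averaged ILD can be sketched the same way. Here the “average level” is interpreted as the ratio of band energies between 1 and 4 kHz, which is one of several reasonable choices (averaging per-bin dB values is another); the random impulse response and the function name `ild_db` are stand-ins, not measured data or toolbox code.

```python
import numpy as np

def ild_db(h_left, h_right, fs, lo=1000.0, hi=4000.0):
    """Left-minus-right level difference (dB) from the energy in the lo..hi band."""
    f = np.fft.rfftfreq(len(h_left), 1.0 / fs)
    band = (f >= lo) & (f <= hi)
    e_left = np.sum(np.abs(np.fft.rfft(h_left))[band] ** 2)
    e_right = np.sum(np.abs(np.fft.rfft(h_right))[band] ** 2)
    return 10 * np.log10(e_left / e_right)

rng = np.random.default_rng(0)
h_l = rng.standard_normal(4096)
h_r = 0.5 * h_l                        # right ear 6 dB quieter at every frequency
print(round(ild_db(h_l, h_r, 48_000), 2))  # 6.02
```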

The three decoders produce substantially different values of ILD for sounds coming from the sides. It should be noted that the high values of ILD come from the cancellation of signals on the side of the head opposite the sound source by sound diffracted around the head.

⁵ https://ccrma.stanford.edu/room-guides/listening-room

Figure 3: Balloon plots of all 16 spherical Slepian basis functions for the region −30° to 90° elevation on the sphere. Lobes with reversed polarity are shown in blue. Note that the first eight are concentrated in the upper hemisphere, the next two in the middle, and the last six in the lower hemisphere. The first 13 ($\lambda > 1/2$) were used for the third-order decoder we tested.

[Figure 4 panels: (b) $\mathbf{r}_E$ direction error (degrees); (c) energy gain (dB)]

Figure 4: The spherical Slepian function decoder’s performance. These show the (a) energy concentration, (b) directional accuracy, and (c) loudness of sources from various directions. Directional errors are clipped at 10°.

Because the results of the ITD, and particularly the ILD, measurements are so complex, the analysis of their effect is quite difficult and beyond the scope of the present paper. That analysis will be published in a subsequent paper.

4 Listening Tests

We conducted informal (non-blind) listening tests of third-order, single-band max-$r_E$ AllRAD and SSF-based decoders using the 15 loudspeakers comprising the upper hemispherical dome in the Listening Room at Stanford’s CCRMA. The decoders computed by the toolbox were saved as AmbDec configuration files and loaded into multiple instances of AmbDec so that rapid comparisons could be made.

As a reference, we also listened to full-sphere playback of the test material over all 22 loudspeakers in the Listening Room using the third-order, two-band decoder described in the previous paper [Heller, Benjamin, and Lee 2012]. Playback levels of all three decoders were matched by ear.

The test material comprised two third-order recordings: a full-sphere mix by Jay Kadis, CCRMA’s audio engineer, of “Babel” by Allette Brooks⁶ and Jörn Nettingsmeier’s recording of Chroma XII by Rebecca Saunders [Nettingsmeier 2012]. Playback was directly from the Ardour sessions for each piece, which gave us the capability to move individual elements of the mix spatially, to test performance from a wider variety of directions, and to solo individual tracks.

In general, both decoders sounded quite good, providing compact and directionally accurate imaging down to the horizontal limit of the playback array. Sources below the horizon were reproduced at the horizon, fading out as they were panned towards the nadir. The SSF-based decoder sounded brighter and more detailed than the AllRAD decoder, despite the fact that neither decoder used frequency-dependent decoding. We also noted that with the AllRAD decoder, as the listener leaned to the left and right, central sources moved in the opposite direction, whereas with the SSF-based decoder, central sources remained in place.

⁶ http://www.cdbaby.com/cd/allette4

[Figure 5 panels: (a) 250 Hz ITD; (b) 1 to 4 kHz ILD]

Figure 5: Interaural time difference (ITD) and interaural level difference (ILD) as a function of azimuth for full-sphere, AllRAD, and SSF-based decoders. Source elevation is 0°.

Neither of the test decoders sounded as good as the reference dual-band, full-sphere decoder, especially in the reproduction of lower-frequency percussion, which lost some of its impact. This may be attributable to the use of correct low-frequency velocity decoding ($r_V = 1$) in the reference decoder vs. wideband max-$r_E$ decoding in the test decoders.

At the end of the listening session, we used a first-order SSF-based decoder to briefly audition a first-order Soundfield microphone recording of an orchestra made by one of the authors.⁷ In this case, the instrumental balance of the orchestra was incorrect; notably, the woodwinds were almost inaudible. After the listening session, we recalled that in this recording the microphone was hung vertically, approximately 3 meters behind and 1.5 meters above the conductor’s head, placing the entire orchestra in the lower hemisphere of the recording. The first-order SSF-based decoder starts fading sources at approximately 20° above the horizon, which caused the instruments at the front of the orchestra to be attenuated significantly. At this point, we cannot recommend this configuration for first-order program material with significant sources in the lower hemisphere. Possible workarounds we intend to try include inverting the vertical signal, Z, to mirror the soundfield across the horizontal (z = 0) plane, or rotating the soundfield about a horizontal axis (“tilt”) in order to move important sources into the upper hemisphere.

⁷ Beethoven: Sym. No. 4 in B-flat Major, Op. 60, 4th Mvt. Available at http://www.ambisonia.com/Members/ajh/ambisonicfile.2008-10-30.6980317146
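Both workarounds are simple linear operations on first-order B-format. In this sketch, mirroring negates Z, and the rotation is about the left-right (Y) axis, the axis that raises sources in front of the listener; that choice of axis, the unit-gain encoding, and the function names are assumptions. The same formulas apply sample by sample to the W, X, Y, and Z signals.

```python
import math

def mirror_vertical(w, x, y, z):
    """Mirror a first-order soundfield across the horizontal (z = 0) plane."""
    return w, x, y, -z

def rotate_about_y(w, x, y, z, degrees):
    """Rotate a first-order soundfield about the left-right (Y) axis.

    Positive angles raise sources in front of the listener.
    """
    b = math.radians(degrees)
    return (w,
            x * math.cos(b) - z * math.sin(b),
            y,
            x * math.sin(b) + z * math.cos(b))

def encode(az_deg, el_deg):
    """Unit-gain first-order encoding of a plane wave (no FuMa W scaling)."""
    az, el = math.radians(az_deg), math.radians(el_deg)
    return 1.0, math.cos(az) * math.cos(el), math.sin(az) * math.cos(el), math.sin(el)

def elevation(w, x, y, z):
    """Elevation (degrees) of the direction encoded in X, Y, Z."""
    return math.degrees(math.atan2(z, math.hypot(x, y)))

src = encode(0.0, -30.0)                                 # orchestra in front, below
print(round(elevation(*mirror_vertical(*src)), 6))       # 30.0
print(round(elevation(*rotate_about_y(*src, 40.0)), 6))  # 10.0
```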

AllRAD decoders generated by the toolbox have been used for performances at Stanford’s Bing Concert Hall and Studio employing CCRMA’s 24-speaker hemispherical dome loudspeaker array. At the dress rehearsal for a performance in the Concert Hall, we were able to compare the new AllRAD decoder to the projection decoder that had been used for previous concerts. The improvement was clearly audible to all present, with increased clarity and directional focus, especially for sources behind and above the audience.

Good results have also been reported using 
modified inversion for a second-order decoder for 
a 12-speaker trirectangle array that is limited by 
the ceiling height of the room, leaving a large 
gap in coverage at the top and bottom of the 
array. 

5 Decoding Engine 

To support operation beyond third order, a variety of plug-in architectures, and use with third-party SDKs, a new Ambisonic decoder engine was implemented in Faust. Faust is a DSP specification language whose compiler can target a variety of plug-in formats and operating systems.

The new implementation comprises about 250 lines of Faust. It has no inherent limits on the Ambisonic order at which it operates and supports three modes of decoding: a single decoding matrix with per-order gains ($\mathbf{T}$); a single decoding matrix with phase-matched shelf filters; and dual-band, with phase-matched band-splitting filters and two decoding matrices. The outputs can be delay and level compensated for speakers at different distances from the center of the array.
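Delay and level compensation can be computed from the speaker distances alone. A common approach, sketched here under the assumptions of a speed of sound of 343 m/s and 1/r level falloff, delays and attenuates each nearer speaker to match the farthest one; the function name is hypothetical, and how the Faust engine rounds or interpolates fractional delays is not shown.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value

def distance_compensation(radii_m, fs):
    """Per-speaker delay (samples) and gain aligning all speakers to the farthest one.

    Nearer speakers are delayed by the extra travel time of the farthest speaker
    and attenuated in proportion to distance (1/r spreading), so all wavefronts
    arrive at the center together and at equal level.
    """
    r_max = max(radii_m)
    delays = [(r_max - r) / SPEED_OF_SOUND * fs for r in radii_m]
    gains = [r / r_max for r in radii_m]
    return delays, gains

delays, gains = distance_compensation([2.0, 1.5, 1.8], fs=48_000)
print([round(d, 2) for d in delays], gains)
```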

Near-field compensation is supplied by digital state-variable realizations of Bessel filters [Smith 2013b] and can be applied at the input or output of the decoder, or turned off completely. The current implementation provides filters for operation up to fifth order, although the toolbox includes facilities for automatically generating filters up to approximately 25th order.⁸
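In the usual formulation of near-field compensation (following Daniel), the degree-$n$ filter’s poles and zeros are, roughly speaking, the roots of the reverse Bessel polynomial $\theta_n$ scaled by $c/r$. This sketch computes those roots with NumPy’s `np.roots`, the analogue of Matlab’s `roots()`; it also hints at why root finding limits the attainable order, since the coefficients grow factorially. How the toolbox maps the roots onto state-variable sections is not reproduced, and the function names are hypothetical.

```python
import math
import numpy as np

def reverse_bessel_coeffs(n):
    """Coefficients (highest power first) of the reverse Bessel polynomial theta_n(x).

    theta_n(x) = sum_{k=0}^{n} (n + k)! / ((n - k)! k! 2^k) x^(n - k); all are integers.
    """
    return [math.factorial(n + k) // (math.factorial(n - k) * math.factorial(k) * 2**k)
            for k in range(n + 1)]

def reverse_bessel_roots(n):
    """Roots of theta_n; converting to float keeps np.roots on a numeric dtype."""
    return np.roots([float(c) for c in reverse_bessel_coeffs(n)])

print(reverse_bessel_coeffs(2))                      # [1, 3, 3]
print(np.sort_complex(reverse_bessel_roots(2)))      # -1.5 -/+ 0.866j
print(max(reverse_bessel_coeffs(25)) > 1e30)         # coefficients grow factorially
```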

User adjustments are supplied for overall gain and muting, as well as for crossover frequency and the relative levels of high and low frequencies. All real-time controls are “dezippered” (smoothed to avoid audible stepping) and can be accessed directly through GUI elements or via Open Sound Control.
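Dezippering usually means low-pass filtering each control value so that a sudden change becomes a short glide. The sketch below shows a one-pole smoother in plain Python; the pole value, the zero initial state, and the function name are assumptions, and in Faust itself one would typically reach for a library smoother such as `si.smoo` rather than writing this by hand.

```python
def dezipper(targets, pole=0.999):
    """One-pole smoothing of a control signal: y[n] = (1 - pole) * x[n] + pole * y[n-1].

    A pole closer to 1 gives a slower, smoother glide (less audible 'zipper' noise).
    """
    y, out = 0.0, []
    for x in targets:
        y = (1.0 - pole) * x + pole * y
        out.append(y)
    return out

ramp = dezipper([1.0] * 2000, pole=0.99)  # a gain control jumping from 0 to 1
print(round(ramp[0], 4), round(ramp[-1], 6))  # 0.01 1.0
```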

In practice, the toolbox writes out the configuration section of the decoder and appends the implementation section, producing a single Faust “dsp” file containing the full decoder. The Faust compiler (either online or local) is used to produce a highly optimized C++ class that implements the decoder, which is then wrapped in a plug-in-specific architecture file that provides the interface to the various SDKs. This is compiled to produce the plug-in file. At the time of this writing, VST, AU, Max/MSP, Pd, LADSPA, LV2, SuperCollider, and many others are supported on Windows, Mac OS X, and Linux. An online compiler is also available.

The decoder engine implementation can be used apart from the toolbox by editing the configuration options and inserting the per-order gains and matrix coefficients manually. Facilities are provided to generate configuration sections directly from existing AmbDec configuration files.

6 Channel-Order, Normalization, and Mixed-Order Conventions

At present, there are a number of channel-order and normalization conventions in use by the Ambisonics community. The toolbox implements all conventions known to the authors, including variants that adjust the gain of the omnidirectional component (W) to be compatible with B-format. Internally, each channel is annotated with its degree, order, gain relative to full orthonormalization (N3D), and Condon-Shortley phase, so additional conventions can be added easily, if needed.

⁸ The limit is imposed by Matlab’s roots() function.
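The per-channel annotation described above might look like the following sketch. The `Channel` record and helper names are hypothetical; the ACN index $l^2 + l + m$ and the SN3D factor $1/\sqrt{2l+1}$ relative to N3D are standard, and `w_gain` models the kind of W adjustment used for B-format compatibility (for example, $1/\sqrt{2}$).

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Channel:
    """Hypothetical per-channel annotation."""
    degree: int                    # l
    order: int                     # m, with -l <= m <= l
    gain_re_n3d: float             # gain relative to full orthonormalization (N3D)
    condon_shortley: bool = False  # whether the (-1)^m phase factor is included

def acn(l, m):
    """Ambisonic Channel Number for degree l and order m."""
    return l * l + l + m

def channels(max_degree, normalization="SN3D", w_gain=1.0):
    """Annotated channel list in ACN order. SN3D = N3D / sqrt(2l + 1); w_gain scales W."""
    out = []
    for l in range(max_degree + 1):
        for m in range(-l, l + 1):
            g = 1.0 if normalization == "N3D" else 1.0 / math.sqrt(2 * l + 1)
            if l == 0:
                g *= w_gain
            out.append(Channel(l, m, g))
    return out

chans = channels(3, "SN3D")
print(chans[acn(1, 1)])  # the X channel: degree 1, order 1, gain 1/sqrt(3)
```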

Two mixed-order conventions are supported by the toolbox: the scheme used in the AMB Ambisonic File Format (#H#P) [Dobson 2012] and one proposed by Travis [2009], which gives resolution-versus-elevation curves that are flatter in and near the horizontal plane (#H#V).
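The #H#P idea can be expressed as a channel-selection rule: all components up to degree P, plus only the sectoral (horizontal, $|m| = l$) components for degrees $P < l \le H$. This sketch lists the selected (degree, order) pairs in ACN order; the actual AMB files use FuMa channel ordering and naming, which is not reproduced here, the Travis #H#V rule is not shown, and the function name is hypothetical.

```python
def amb_mixed_order(h, p):
    """(degree, order) pairs of an #H#P mixed-order set, listed in ACN order.

    Every component up to degree p is kept; for p < l <= h only the sectoral
    components (|m| = l), which carry the extra horizontal resolution, are kept.
    """
    return [(l, m) for l in range(h + 1) for m in range(-l, l + 1)
            if l <= p or abs(m) == l]

print(len(amb_mixed_order(3, 1)))   # 8 channels for a #3H1P stream
print(amb_mixed_order(2, 1))
```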

7 Conclusions and Future Work 

We have reported on extensions to the Ambisonic Decoder Toolbox to handle popular loudspeaker configurations that do not cover the full sphere, such as hemispherical domes and multilevel rings. The toolbox has also been extended to operate at higher Ambisonic orders and with alternate channel-order and normalization conventions. To support that, and multiple plug-in architectures, we have written a new, full-featured decoder in Faust.

In general, the ability to generate decoders quickly has proven valuable in performance settings where setup time is short and the speakers are not necessarily installed in the planned locations. It also places less emphasis on performance prediction, because a number of decoders can be generated with different methods and parameter settings and then auditioned to determine the best one for a particular set of playback conditions.

Generating dual-band decoders from these alternate methods is an obvious extension for the toolbox, as is using the decoders as initial estimates for the optimizer. Users have requested adding bass management to the decoder implementation. We have also investigated hosting the toolbox on a server and linking directly to the online Faust compiler, so that a user does not need to install any software to use it.

As highlighted at the end of our listening session, a significant open question with partial-coverage decoders is what should happen if a source moves into a “poor” area, for example, the zenith or nadir directions. The effect of a Spitfire flying low overhead is probably not compromised if it appears too loud or lacks exact localization. Conversely, a source moving underground may be allowed to fade.⁹

The current implementations simply discard these sources, fading them out as they are panned beyond the coverage region. In the case of the AllRAD decoders, they can be brought out for further processing by simply making the imaginary speakers into real speakers in the configuration file; however, these signals cannot simply be mixed into existing speaker feeds, as the coherent combination of the signals will distort the directional fidelity of the decoder, especially for sources near the horizon. One proposal is to decorrelate them using a broadband 90° phase shift and then sum them into the speaker feeds. Other suggestions are welcome.
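A broadband 90° phase shift is a Hilbert transform. The offline, FFT-based sketch below (NumPy) illustrates the operation on a test tone; a real-time decoder would need an approximation such as a pair of IIR all-pass networks, the function name is hypothetical, and the sketch does not address whether the proposal preserves directional fidelity.

```python
import numpy as np

def phase_shift_90(x):
    """Broadband -90 degree phase shift (Hilbert transform) via the FFT.

    Offline and circular, for illustration: the DC and Nyquist bins are zeroed
    because they cannot be shifted by 90 degrees.
    """
    X = np.fft.rfft(x)
    X[1:] *= -1j                        # rotate every positive-frequency bin by -90 degrees
    X[0] = 0.0
    if len(x) % 2 == 0:
        X[-1] = 0.0
    return np.fft.irfft(X, len(x))

n = np.arange(1024)
cos_wave = np.cos(2 * np.pi * 5 * n / 1024)
shifted = phase_shift_90(cos_wave)      # the cosine becomes a sine
print(np.max(np.abs(shifted - np.sin(2 * np.pi * 5 * n / 1024))))
```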

The toolbox is open source and available under the GNU Affero General Public License, version 3. The Faust code generated by the toolbox is covered by the BSD 3-Clause License, so that it may be combined with other code without restriction. Contact the authors to obtain a copy of the toolbox.